A method for mapping a work (e.g., a picture, a document, or the
like) comprising a large number of bytes into a relatively small bit string to
facilitate copy protection. The technique creates a first string from the work
with the property that if one modifies the work by some given degree
(e.g., through intentional variation or perhaps merely through aging), a
second string (generated by the inventive technique) and associated with
the modified work is not meaningfully distinct from the first string. When
processed by the algorithm, works that are similar result in the same
or similar strings. As a result, an owner of the content sought to be
protected may apply the algorithm to copies of the work for copy protection
purposes.

METHOD FOR IMAGE PROCESSING TO FACILITATE COPY PROTECTION

BACKGROUND OF THE INVENTION

Technical Field

The present invention relates generally to securing content against wrongful duplication
and, more particularly, to techniques for processing an image into a digital string so that images
with similar features (e.g., illicit or cropped copies) get mapped into a digital string that is close
to the original string.

Description of the Related Art 

The proliferation of digitized media and the ease with which digital files can be copied,
especially over public computer networks (e.g., the Internet), has created a need for copy and
copyright enforcement schemes. Conventional cryptographic systems permit only valid
keyholders access to encrypted data, but once such data is decrypted there is no way to track its
reproduction or retransmission. Such schemes thus provide insufficient protection against
unauthorized reproduction of information. It is known in the prior art to provide a so-called
digital "watermark" on a document to address this problem. A "watermark" is a visible or
preferably invisible identification code that is permanently embedded in the data and thus
remains present within the data after any decryption process. One example of a digital
watermark would be a visible "seal" placed over an image to identify the copyright owner.
However, the watermark might also contain additional information, including the identity of the
purchaser of a particular copy of the material.

Many schemes have been proposed for watermarking digital data. In a known
watermarking procedure, each copy of a document D is varied slightly so as to look the same to
the user but also so as to include the identity of the purchaser. The watermark consists of the
variations that are unique to each copy. The idea behind such schemes is that the watermark
should be hard to remove without destroying the document. Thus, a copy of a watermarked
document should be traceable back to the specific version of the original from which it was
created. Although certain watermarking techniques provide advantages, there remains a need for
a more general approach to the problems associated with content protection.

SUMMARY OF THE INVENTION 
It is a primary object of the present invention to provide an improved method for securing
content against wrongful duplication.

It is still another object of this invention to provide improved methods for processing an
image (or other work) into a digital string so that images with similar features (e.g., illicit or
cropped copies) get mapped into a digital string that is close to the original digital string.

It is another more general object of this invention to protect against the wrongful 
proliferation of digitized media over public computer networks (e.g., the Internet). 

A still further object of this invention is to provide for more efficient and reliable copy 
and copyright enforcement schemes. 

The above as well as additional objects, features, and advantages of the present invention 
will become apparent in the following detailed written description. 

BRIEF DESCRIPTION OF THE DRAWINGS 

The novel features believed characteristic of the invention are set forth in the appended
claims. The invention itself, however, as well as a preferred mode of use, further objects, and
advantages thereof, are best understood by reference to the following Detailed Description of an
illustrative embodiment when read in conjunction with the accompanying Drawings, wherein:

Figure 1 is a flowchart of the inventive method.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT 

The present invention is preferably implemented in software (e.g., a series of program 
instructions) executed in a computer. It is assumed that the content to be protected is a picture or
other image. The invention may be applied to any content (including, without limitation, video, 
graphics, sound, text, etc.) irrespective of its form, however. For purposes of illustration, the 
invention will be described in the context of a picture. The picture is digitized in any convenient 
manner. 

According to the invention, once digitized, the picture is run through a hashing algorithm
and reduced to approximately thirty (30) bits so that almost any other picture that is a small
modification of the original gets mapped to the same string (with perhaps a couple of bits
different). This is useful in watermarking, since this string could then be used as part of
x_1, ..., x_n when one uses H(x_1 ... x_n | Do not copy) as an offset watermark. When this
approach is followed, the watermark used for different pictures is different (which makes it
harder for the adversary to remove), but it is still possible for someone who knows H to
regenerate the watermark for a correlation test (without access to a database). Even if the
adversary changes the picture a little, he cannot change the string x_1, ..., x_n very much, so it is
possible to later regenerate the original x_1, ..., x_n by computing the derived values from the
picture and trying all possibilities that are within a few bits.
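
The "trying all possibilities that are within a few bits" step can be sketched as a simple enumeration. The following is a minimal illustration only: the function name neighbors_within, the sample 30-bit value, and the choice of two bit flips are assumptions made for the example and do not come from the specification.

```python
from itertools import combinations
from typing import Iterator


def neighbors_within(bits: int, length: int, max_flips: int) -> Iterator[int]:
    """Yield every `length`-bit value that differs from `bits` in at most
    `max_flips` bit positions (including `bits` itself)."""
    for k in range(max_flips + 1):
        for positions in combinations(range(length), k):
            candidate = bits
            for p in positions:
                candidate ^= 1 << p  # flip bit p
            yield candidate


# Hypothetical 30-bit string derived from a (possibly altered) picture.
derived = 0b101100111000101011010011100101

# Every string within two bit flips: 1 + 30 + 435 candidates.
candidates = list(neighbors_within(derived, 30, 2))
print(len(candidates))
```

Each candidate string would then be fed to H to regenerate a trial watermark for the correlation test. The candidate count grows quickly with the number of allowed flips, which is one reason a short string (about thirty bits) keeps this search cheap.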

Such a mapping function is also useful in Web search engines. For example, assume one 
wants to search for copies of a document/picture on the Web even if such copies are changed a 
little bit. One could use the present invention and compute the string for every candidate picture 
on the Web and see which ones were close to the string for the original. From this information, 
one could check the candidates that match by eye, thereby cutting the search space by a huge 
factor. 
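
As a rough sketch of this search, candidate strings can be ranked by the number of bit positions in which they differ from the original's string. The names hamming and close_matches, the file names, the 10-bit strings, and the distance cutoff of 2 are all hypothetical values chosen for illustration.

```python
from typing import Dict, List


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two bit strings differ."""
    return bin(a ^ b).count("1")


def close_matches(original: int, candidates: Dict[str, int],
                  max_distance: int = 2) -> List[str]:
    """Return candidate names whose strings lie within `max_distance` bits
    of the original's string, closest first."""
    hits = [(hamming(original, bits), name)
            for name, bits in candidates.items()
            if hamming(original, bits) <= max_distance]
    return [name for _, name in sorted(hits)]


# Hypothetical 10-bit strings for three pictures found on the Web.
original_bits = 0b1011001110
found = {
    "a.png": 0b1011001110,  # identical string
    "b.png": 0b1011001010,  # one bit differs
    "c.png": 0b0100110001,  # every bit differs
}
matches = close_matches(original_bits, found)
print(matches)
```

Only the pictures returned by such a filter would need to be checked by eye.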

There may also be applications in the domain of organizing pictorial data into similarity 
classes, which could be done according to the digital string produced by the following algorithm. 
Implementation 

With reference now to the flowchart of Figure 1, the inventive mapping scheme has a 
number of steps. 

Step 1: generate m x n N(0,1) random variables Z_(1,1), ..., Z_(m,n). Store these values
in an m x n matrix Z*. These values will be used for all documents and will be kept secret. 
This may be done in an off-line process. 

Step 2: convert the document into a digital string y_1, ..., y_n using any standard method.
If possible, normalize the y_i values so that over the universe of all documents, y_i can be
thought of as an N(0,1) random variable. (The latter step is not required, but it makes the process
work better.) Treat the n values as a vector y*.

Step 3: compute the matrix-vector product v* = Z* x y*. Let v_1, ..., v_m denote the
values of v*. Note that each v_i can be thought of as a normal random variable, no matter what
the y values are. For example, the mean of each v_i is M = (y_1 + ... + y_n)/n.

Step 4: compute two thresholds t' and t" as follows. Select t' so that the probability that
y_1 x u_1 + ... + y_n x u_n < t' is approximately 1/4 when each u_i is an independent N(0,1)
random variable. Select t" so that the probability that y_1 x u_1 + ... + y_n x u_n > t" is
approximately 1/4. There are many possible variations of this step, e.g., one with just one
threshold at the value t = M, or one that adjusts the two threshold values.

Step 5: for each i, if v_i < t' or if v_i > t", then set b_i = 0. Otherwise, set b_i = 1.
(Note that each b_i will be equally likely to be 0 or 1.) The values b_1, ..., b_m form the desired
bit string for the document.

The mapping function reduces the content (to be protected) from a large number of bytes
to a string having a small number of bits. Changing just a few values of y* (or all of the values
of y* by a little bit), however, does not have a meaningful effect on the values of b* that are
computed. For example, in order to change the value of b_i, the adversary must change enough
of the y_i's by enough so that the corresponding v_i is pushed across a threshold. Without
knowing the Z* values, this is hard for the adversary to do if he is restricted to making small
changes in the y_i's (since most of the v_i's are not likely to be very near a threshold in the first
place). In fact, there is a formal mathematical proof that says that the amount by which the
adversary is required to change the y_i's to effect a significant change in the b_i's is the
maximum possible for any scheme.

Thus, the mapping function (when applied to copies of the document) is quite useful for 
copy protection purposes. Generalizing, the technique described above creates a first string from 
a picture (or other work) with the property that if one modifies the picture by some given degree 
(e.g., through intentional variation or perhaps merely through aging), a second string (generated 
by the inventive technique) and associated with the modified picture is not meaningfully distinct 
from the first string. The invention can be used to reduce a picture (or other work) comprising a 
large number of bytes into a string having a small number of bits, and thus pictures that are 
similar result in the same or similar strings. As a result, an owner of the content sought to be
protected may apply the algorithm to copies of the work for copy protection purposes. 
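
The property described above (a slightly modified work yields nearly the same string, while an unrelated work does not) can be checked empirically with a small experiment. The sketch below is illustrative only: the sizes n and m, the noise level of 0.02, the seed, and the helper names are assumptions, and the input values are drawn directly as N(0,1) samples rather than taken from real pictures.

```python
import math
import random
from statistics import NormalDist
from typing import List, Sequence

Q75 = NormalDist().inv_cdf(0.75)  # upper quartile of N(0,1), about 0.6745


def bit_string(y: Sequence[float], z: List[List[float]]) -> List[int]:
    """Two-threshold mapping: bit is 1 when v_i lies between the quartiles
    of N(0, |y|^2), and 0 otherwise."""
    length = math.sqrt(sum(v * v for v in y))
    lo, hi = -Q75 * length, Q75 * length
    out = []
    for row in z:
        v = sum(zij * yj for zij, yj in zip(row, y))
        out.append(0 if (v < lo or v > hi) else 1)
    return out


def distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions in which two bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


rng = random.Random(1)  # fixed seed so the experiment is repeatable
n, m = 2000, 30
Z = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]

original = [rng.gauss(0.0, 1.0) for _ in range(n)]
slightly_changed = [v + rng.gauss(0.0, 0.02) for v in original]  # small edit
unrelated = [rng.gauss(0.0, 1.0) for _ in range(n)]              # other work

d_small = distance(bit_string(original, Z), bit_string(slightly_changed, Z))
d_unrelated = distance(bit_string(original, Z), bit_string(unrelated, Z))
print("bits changed by small edit:", d_small)
print("bits differing from unrelated work:", d_unrelated)
```

Under these assumptions, a small edit should move each v_i by an amount that is tiny compared with its distance to the nearest threshold in most cases, so few if any bits flip, whereas an unrelated work should disagree in roughly half of the positions.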

As described above, aspects of the present invention pertain to specific "method steps" 
implementable on one or more computers. In an alternate embodiment, the invention may be
implemented as a computer program product for use with a computer system. Those skilled in 
the art should readily appreciate that programs defining the functions of the present invention 
can be delivered to a computer in many forms, which include, but are not limited to: 

(a) information permanently stored on non-writable storage media (e.g., read-only memory
devices within a computer such as ROM or optical disks readable by a CD-ROM drive);

(b) information alterably stored on writable storage media (e.g., floppy disks within a diskette
drive or a hard disk drive); or

(c) information conveyed to a computer through communication
media, such as through a computer or telephone or other network.

It should be understood,
therefore, that such media, when carrying computer readable instructions that direct the method
functions of the present invention, represent alternate embodiments of the present invention.

In addition, although the various methods described are conveniently implemented in a
general purpose computer selectively activated or reconfigured by software, one of ordinary skill
in the art would also recognize that such methods may be carried out in hardware, in firmware,
or in more specialized apparatus (e.g., a secure chip) constructed to perform the required method
steps.

While the invention has been particularly shown and described with reference to a 
preferred embodiment, it will be understood by those skilled in the art that various changes in 
form and detail may be made therein without departing from the spirit and scope of the 
invention.

CLAIMS

1. A method for mapping a document comprising a large number of bytes into a bit
string to facilitate copy protection of the document, comprising the steps of:

generating m x n N(0,1) random variables Z_(1,1), ..., Z_(m,n) and storing the random
variables in an m x n matrix Z*;

processing the document into a digital string y_1, ..., y_n;

normalizing the y_i values to generate a vector y*;

computing a matrix-vector product v* = Z* x y*, where v_1, ..., v_m denote the values of
v*;

selecting first and second thresholds t' and t", where t' is selected so that the probability
that y_1 x u_1 + ... + y_n x u_n < t' is close to about 1/4 when each u_i is an independent N(0,1)
random variable, and where t" is selected so that the probability that y_1 x u_1 + ... + y_n x u_n
> t" is close to about 1/4;

for each i, determining if v_i < t' or if v_i > t";

if v_i is less than t' or if v_i is greater than t", then set b_i = 0;

if v_i is not less than t' and v_i is not greater than t", set b_i = 1; and

concatenating the values b_1, ..., b_m into the bit string.